Questionable Raters + Low Agreement + Inadequate Sampling

Author

  • Robert J. Harvey
Abstract

Several studies have made positive claims regarding the validity, reliability, and utility of the Occupational Information Network (O*NET). In this first of three studies questioning such claims, I focused on the root cause of many of O*NET's problems: the practice of rating overly abstract and heterogeneous occupational units (OUs), collecting ratings on OUs that exhibit substantial interrater disagreement, computing aggregate profiles that eliminate true within-OU variance, then viewing correlations involving the aggregates as proof of reliability and validity. I used actual O*NET data and Monte Carlo-simulated raters to demonstrate how aggregation bias (James, 1982) can produce the illusory appearance of high correlations at the OU-aggregate level, even when little or no true agreement exists among raters. Because the cross-group convergence studies cited as proof of O*NET's reliability and validity reported correlations that were comparable to (and in some cases, far lower than) results produced from totally random ratings, O*NET's standing with respect to reliability and validity is questionable. I conclude that when evaluating occupational data quality, a necessary (but not sufficient) condition is that interrater agreement levels must exceed results seen in benchmark comparisons using data of known-unacceptable quality (e.g., when random ratings are combined with a few does-not-apply agreements). However, when vague, unverifiable, hypothetical constructs are directly rated using single-item tests (as in O*NET), even perfect interrater agreement is not sufficient to justify inferences of reliability and validity (although the failure to exceed random-rater benchmarks makes a strong statement regarding the lack of quality). Because O*NET's aggregation bias problems are caused by its overly abstract title taxonomy, the only solution is to start over and re-populate the database using occupational titles that exhibit far less true within-title disagreement. Revising the data collection process to rate observable, verifiable job characteristics (and to collect data from objective, job-knowledgeable raters) would also make it far more defensible to cite interrater agreement indices in support of an inference of data quality.
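The aggregation-bias mechanism described in the abstract is easy to reproduce. The sketch below is a minimal illustration only, not the paper's actual simulation; all sample sizes and parameter values are arbitrary choices. It draws completely random 1-5 ratings for two independent groups of raters, forces unanimous does-not-apply (0) scores on a handful of descriptors per OU, and then correlates the two groups' OU-level mean profiles. The cross-group profile correlations come out high even though the raters share no true agreement on the applicable items.

import numpy as np

rng = np.random.default_rng(0)

n_ous, n_items, dna_items = 100, 40, 8   # OUs, descriptors per OU, does-not-apply items per OU
group_size = 15                          # raters per independent group

# Per OU, pick the descriptors that every rater scores 0 ("does not apply").
dna_mask = np.zeros((n_ous, n_items), dtype=bool)
for ou in range(n_ous):
    dna_mask[ou, rng.choice(n_items, size=dna_items, replace=False)] = True

def random_group_profile():
    # Purely random 1-5 ratings, zeroed on the shared does-not-apply items,
    # then averaged over raters to form the OU-level aggregate profile.
    ratings = rng.integers(1, 6, size=(n_ous, group_size, n_items)).astype(float)
    ratings = np.where(dna_mask[:, None, :], 0.0, ratings)
    return ratings.mean(axis=1)

profile_a = random_group_profile()
profile_b = random_group_profile()

# "Cross-group convergence": correlate the two aggregate profiles within each OU.
rs = [np.corrcoef(profile_a[ou], profile_b[ou])[0, 1] for ou in range(n_ous)]
print(f"mean cross-group profile correlation: {np.mean(rs):.2f}")   # typically above .90

Despite zero true agreement on the rated items, the shared pattern of does-not-apply zeros versus mid-scale random means is enough to push the aggregate-level profile correlations above .90, which is the benchmark argument the abstract makes against citing such correlations as evidence of data quality.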

Similar articles

ACSC Indicator: testing reliability for hypertension

BACKGROUND With high-quality community-based primary care, hospitalizations for ambulatory care sensitive conditions (ACSC) are considered avoidable. The purpose of this study was to test the inter-physician reliability of judgments of avoidable hospitalizations for one ACSC, uncomplicated hypertension, derived from medical chart review. METHODS We applied the Canadian Institute for Health In...

Comparison of the validity and reliability of two image classification systems for the assessment of mammogram quality.

OBJECTIVE To compare the reliability and validity of two classification systems used to evaluate the quality of mammograms: PGMI ('perfect', 'good', 'moderate' and 'inadequate') and EAR ('excellent', 'acceptable' and 'repeat'). SETTING New South Wales (Australia) population-based mammography screening programme (BreastScreen NSW). METHODS Thirty sets of mammograms were rated by 21 radiograp...

Comparison between inter-rater reliability and inter-rater agreement in performance assessment.

INTRODUCTION Over the years, performance assessment (PA) has been widely employed in medical education, the Objective Structured Clinical Examination (OSCE) being a prominent example. Performance assessment typically involves multiple raters, so consistency among the scores they provide is a precondition for an accurate assessment. Inter-rater agreement and i...

A repeated measures concordance correlation coefficient.

The concordance correlation coefficient is commonly used to assess agreement between two raters or two methods of measuring a response when the data are measured on a continuous scale. However, the situation may arise in which repeated measurements are taken for each rater or method, e.g. longitudinal studies in clinical trials or bioassay data with subsamples. This paper proposes a coefficient...
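For reference, the single-measurement coefficient that such repeated-measures versions build on, Lin's (1989) concordance correlation coefficient, can be written as

\rho_c = \frac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2}

where \mu_1, \mu_2 and \sigma_1^2, \sigma_2^2 are the means and variances of the two raters' (or methods') measurements and \sigma_{12} is their covariance; \rho_c reaches 1 only under perfect agreement, penalizing both imperfect correlation and shifts in location or scale.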

Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) Checklist

BACKGROUND The COSMIN checklist is a tool for evaluating the methodological quality of studies on measurement properties of health-related patient-reported outcomes. The aim of this study is to determine the inter-rater agreement and reliability of each item score of the COSMIN checklist (n = 114). METHODS 75 articles evaluating measurement properties were randomly selected from the bibliogra...


Journal:

Volume   Issue

Pages  -

Publication date: 2009